The Central Hub: master_isin_map.json
Every data flow in the pipeline begins with, or depends on, the master ISIN map created in Phase 1.
master_isin_map.json Structure
- ISIN: International Securities Identification Number (used by ALL APIs)
- Sid: Security ID (required for OHLCV and advanced indicators)
- Symbol: Stock ticker (used for file naming and CSV matching)
- Name: Company full name
Why It’s Critical: Every script in Phase 2+ iterates over this map to:
- Know which stocks to fetch data for
- Match API responses back to symbols
- Ensure consistent ISIN → Symbol mapping across all datasets
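A minimal sketch of how a Phase 2+ script might consume the map. The exact file layout is not shown here, so this assumes a JSON array of objects carrying the four keys listed above; `load_isin_map` and `build_isin_lookup` are hypothetical helper names, and the sample record is a placeholder, not real data.

```python
import json
from pathlib import Path


def load_isin_map(path):
    """Load the master map.

    Assumption: the file is a JSON array of objects with the keys
    ISIN, Sid, Symbol and Name.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_isin_lookup(records):
    """Index records by ISIN so API responses can be matched back to symbols."""
    return {rec["ISIN"]: rec["Symbol"] for rec in records}


# Placeholder record for illustration only.
sample = [
    {"ISIN": "XX0000000001", "Sid": 1, "Symbol": "EXAMPLE", "Name": "Example Ltd"},
]
lookup = build_isin_lookup(sample)

for rec in sample:
    # A fetcher would call its API here using rec["ISIN"] or rec["Sid"].
    print(rec["Symbol"], "->", rec["ISIN"])
```

Building the ISIN lookup once keeps the ISIN → Symbol mapping identical across every dataset a script writes.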
Phase-by-Phase Data Transformation
Phase 1: Core Data Foundation
1. Market Snapshot
Script: fetch_dhan_data.py
Raw Output: dhan_data_response.json (~5 MB)
- 2,775 stocks with current prices, technical indicators, volume
master_isin_map.json
- Extracted ISIN, Sid, Symbol, Name for all stocks
2. Fundamental Data
Script: fetch_fundamental_data.py
API Calls: One per stock (2,775 requests)
Output: fundamental_data.json (~35 MB)
- Quarterly results (Net Profit, EPS, Sales, OPM)
- Annual results (5 years history)
- Balance sheet data
- Shareholding patterns
- Valuation ratios (ROE, ROCE, P/E)
Phase 1 result:
- 2,775 ISINs mapped to symbols
- Current market data (prices, volumes, RSI)
- 5 years of quarterly fundamentals
- Listing dates
Phase 2: Data Enrichment (Parallel Fetching)
All scripts in this phase run independently using master_isin_map.json. They can execute in any order (or in parallel).
- Company Filings
- Market News
- Advanced Indicators
- Corporate Actions
- Other Fetchers
fetch_company_filings.py
Strategy: Hybrid dual-endpoint fetching
API Calls: 2 per stock × 2,775 = 5,550 requests
Deduplication Logic:
- By news_id + news_date + caption
- Keeps most recent 100 filings per stock
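The deduplication rule above can be sketched as follows. This is an illustration under assumptions, not the script's actual code: `dedupe_filings` is a hypothetical name, each filing is assumed to be a dict, and `news_date` is assumed to be an ISO-8601 string so that string order matches date order.

```python
def dedupe_filings(filings, limit=100):
    """Drop duplicates keyed on (news_id, news_date, caption), keep the newest `limit`."""
    seen = set()
    unique = []
    for item in filings:
        key = (item.get("news_id"), item.get("news_date"), item.get("caption"))
        if key in seen:
            continue  # same filing returned by both endpoints
        seen.add(key)
        unique.append(item)
    # Assumption: ISO-8601 date strings, so a plain string sort is chronological.
    unique.sort(key=lambda f: f.get("news_date") or "", reverse=True)
    return unique[:limit]


# Placeholder responses from the two endpoints (illustrative data only).
endpoint_a = [
    {"news_id": 1, "news_date": "2024-05-01", "caption": "Quarterly Results"},
    {"news_id": 2, "news_date": "2024-04-15", "caption": "Board Meeting"},
]
endpoint_b = [
    {"news_id": 1, "news_date": "2024-05-01", "caption": "Quarterly Results"},
    {"news_id": 3, "news_date": "2024-05-10", "caption": "Dividend"},
]
merged = dedupe_filings(endpoint_a + endpoint_b)
print([f["news_id"] for f in merged])
```

Keying on all three fields together means two filings that share an ID but differ in date or caption are both kept.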
Phase 2 result:
- 100 regulatory filings per stock
- 50 news items per stock
- Technical indicators (Pivots, SMA/EMA)
- 2 years corporate action history + 2 months upcoming
- Surveillance flags
- Circuit breaker status
- Bulk/block deals
- Price band revisions
Phase 2.5: OHLCV Data (Incremental Download)
fetch_all_ohlcv.py Flow
API Call: one per stock
Smart Incremental Logic:
- Check if ohlcv_data/RELIANCE.csv exists
- If yes: Read last date, set START to last date + 1 day
- If no: Download from 1976 (full history)
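A minimal sketch of the start-date decision described above. It assumes each CSV has a `Date` column holding ISO-8601 dates with rows in chronological order; `next_start_date` is a hypothetical helper, and the exact day used for "1976" is also an assumption.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# The text says "from 1976"; January 1 is an assumed exact day.
FULL_HISTORY_START = date(1976, 1, 1)


def next_start_date(csv_path):
    """Return the first date that still needs downloading for one symbol."""
    path = Path(csv_path)
    if not path.exists():
        return FULL_HISTORY_START  # no file yet: fetch the full history
    with path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    if not rows:
        return FULL_HISTORY_START  # header-only file: treat as missing
    # Assumption: rows are chronological and "Date" is ISO-8601.
    last_day = date.fromisoformat(rows[-1]["Date"])
    return last_day + timedelta(days=1)


print(next_start_date("ohlcv_data/RELIANCE.csv"))
```

Reading only the last row's date is what keeps incremental runs short: a stock that is already up to date needs no historical range at all.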
Performance:
- First-time: ~30 minutes (2,775 stocks × full history)
- Incremental: ~2-5 minutes (only new dates)
Phase 2.5 result:
- Daily OHLCV data for all stocks (from listing date to today)
- ~2,775 CSV files in the ohlcv_data/ directory
Phase 3: Base Analysis (Creating Master JSON)
bulk_market_analyzer.py Transformation
Inputs:
- fundamental_data.json → Financial metrics
- dhan_data_response.json → Current prices, technical indicators
- advanced_indicator_data.json → Pivots, SMA/EMA signals
- nse_equity_list.csv → Listing dates
Quarterly Metrics Extraction:
- Raw: "NET_PROFIT": "1250.5|1180.2|1090.8|1050.3|1100.1"
- Extracted:
  - Net Profit Latest Quarter: 1250.5
  - Net Profit Previous Quarter: 1180.2
  - Net Profit Last Year Quarter: 1100.1
- Calculated:
  - QoQ % Net Profit Latest: ((1250.5 - 1180.2) / 1180.2) × 100 = 5.96%
  - YoY % Net Profit Latest: ((1250.5 - 1100.1) / 1100.1) × 100 = 13.67%
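The quarterly extraction can be reproduced with a short sketch. It assumes the pipe-delimited series is ordered newest first, so index 0 is the latest quarter, index 1 the previous quarter, and index 4 the same quarter one year earlier; `parse_series` and `pct_change` are hypothetical helper names.

```python
def parse_series(raw):
    """Split a pipe-delimited string such as "1250.5|1180.2" into floats."""
    return [float(value) for value in raw.split("|")]


def pct_change(new, old):
    """Percentage change from old to new; None when the base is zero."""
    if old == 0:
        return None
    return (new - old) / old * 100


net_profit = parse_series("1250.5|1180.2|1090.8|1050.3|1100.1")

# Assumption: newest-first ordering, five quarters per series.
latest, previous, last_year = net_profit[0], net_profit[1], net_profit[4]

qoq = pct_change(latest, previous)
yoy = pct_change(latest, last_year)
print(round(qoq, 2), round(yoy, 2))  # 5.96 13.67
```

Returning `None` for a zero base is a design choice made here so that a missing or zero prior-period figure does not raise during a bulk run.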
Valuation Ratios:
- D/E Ratio: Non-Current Liabilities / Total Equity
- PEG Ratio: P/E / YoY EPS Growth
- Forward P/E: P/E × (TTM EPS / Annualized Latest EPS)
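The three valuation formulas translate directly into code. This sketch assumes "Annualized Latest EPS" means the latest quarter's EPS multiplied by 4, which is not stated explicitly; function names and input values are illustrative only.

```python
def safe_div(numerator, denominator):
    """Divide, returning None instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else None


def de_ratio(non_current_liabilities, total_equity):
    return safe_div(non_current_liabilities, total_equity)


def peg_ratio(pe, yoy_eps_growth_pct):
    return safe_div(pe, yoy_eps_growth_pct)


def forward_pe(pe, ttm_eps, latest_quarter_eps):
    # Assumption: annualized EPS = latest quarterly EPS x 4.
    annualized_eps = latest_quarter_eps * 4
    ratio = safe_div(ttm_eps, annualized_eps)
    return None if ratio is None else pe * ratio


# Illustrative inputs, not real company data.
print(de_ratio(300.0, 600.0))        # 0.5
print(peg_ratio(20.0, 10.0))         # 2.0
print(forward_pe(20.0, 50.0, 15.0))
```

When the latest quarter's annualized EPS exceeds TTM EPS, the forward P/E comes out below the trailing P/E, which matches the intent of the formula.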
Shareholding Changes:
- Raw: "FII": "25.3|24.1"
- Calculated: FII % change QoQ: 25.3 - 24.1 = 1.2 percentage points
- Free Float: 100 - Promoter%
- Float Shares: Total Shares × (Free Float / 100)
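A sketch of the shareholding arithmetic, assuming the raw string holds the latest value first and the previous quarter second. The promoter percentage and share count below are made-up inputs, and the function names are hypothetical.

```python
def holding_change(raw):
    """Difference between latest and previous holding, in percentage points."""
    latest, previous = (float(v) for v in raw.split("|")[:2])
    return round(latest - previous, 2)


def free_float_pct(promoter_pct):
    return 100 - promoter_pct


def float_shares(total_shares, free_float):
    return total_shares * (free_float / 100)


fii_change = holding_change("25.3|24.1")
print(fii_change)  # 1.2

# Illustrative inputs only.
ff = free_float_pct(50.0)
print(ff, float_shares(1_000_000, ff))
```

The change is a subtraction of two percentages, so the result is in percentage points rather than a relative percent change.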
Technical Indicator Parsing:
- SMA Status: “SMA 20: Above (4.9%) | SMA 50: Above (24.1%)”
- EMA Status: “EMA 20: Above (6.3%) | EMA 200: Above (72.6%)”
- Technical Sentiment: “RSI: Neutral | MACD: Bearish”
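One way such status strings could be assembled is sketched below. It assumes the percentage is the distance of the current price from the moving average, and that "Below" cases show the absolute distance; neither detail is confirmed here, and `ma_status` is a hypothetical helper.

```python
def ma_status(label, price, moving_average):
    """Format e.g. "SMA 20: Above (4.9%)" from a price and a moving average."""
    # Assumption: the percentage is (price - MA) / MA * 100.
    pct = (price - moving_average) / moving_average * 100
    side = "Above" if pct >= 0 else "Below"
    return f"{label}: {side} ({abs(pct):.1f}%)"


def join_status(parts):
    """Join individual statuses with the " | " separator used in the output."""
    return " | ".join(parts)


# Illustrative prices chosen to reproduce the sample string above.
sma_status = join_status([
    ma_status("SMA 20", 104.9, 100.0),
    ma_status("SMA 50", 124.1, 100.0),
])
print(sma_status)
```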
Index Membership:
- Filters tech.idxlist for specific indices (Nifty 50, Bank Nifty, etc.)
- Comma-separated list: “NIFTY 50, NIFTY BANK, NIFTY 100”
Phase 3 result:
- Base JSON with 60+ fields for all 2,775 stocks
- Identity, Fundamentals, Valuation, Ownership, Technical indicators
- Ready for in-place enrichment in Phase 4
Phase 4: Enrichment Injection (Sequential Modifications)
Each script in this phase reads all_stocks_fundamental_analysis.json, modifies it in-place, and writes it back.
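The read-modify-write cycle could look like the sketch below. It assumes the master JSON is an array of per-stock objects; `enrich_in_place` is a hypothetical helper, and writing through a temporary file followed by `os.replace` is a design choice made here so that an interrupted run cannot leave a half-written file.

```python
import json
import os
import tempfile


def enrich_in_place(path, enrich):
    """Load the master JSON, apply enrich(stock) to each record, write it back."""
    with open(path, encoding="utf-8") as fh:
        stocks = json.load(fh)  # assumption: a list of per-stock dicts

    for stock in stocks:
        enrich(stock)  # mutates the record, e.g. adds new fields

    # Write to a temp file in the same directory, then swap it in atomically.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(stocks, fh)
    os.replace(tmp_path, path)
    return len(stocks)
```

Because every Phase 4 script rewrites the same file, the scripts have to run one after another rather than in parallel.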
1. Advanced Metrics (OHLCV-based)
Script: advanced_metrics_processor.py
Reads: ohlcv_data/{SYMBOL}.csv for each stock
Calculations (15 fields added):
- ATH, % from ATH
- 5/14/20/30 Days MA ADR(%)
- RVOL
- Gap Up %, Day Range %
- % from 52W Low
- 6 Month Returns(%)
- 200 Days EMA Volume
- % from 52W High 200 Days EMA Volume
- Daily Rupee Turnover 20/50/100(Cr.)
- 30 Days Average Rupee Volume(Cr.)
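Two of the listed metrics are sketched below to show the general shape of the calculations. The definitions are assumptions: "% from ATH" is taken as the last close relative to the highest high, and RVOL as the latest volume divided by the mean of the preceding 20 sessions. The processor's actual formulas and windows may differ.

```python
def ath_metrics(highs, last_close):
    """Return (ATH, % from ATH) for one stock's full price history."""
    ath = max(highs)
    # Assumption: negative values mean the stock trades below its ATH.
    return ath, (last_close - ath) / ath * 100


def rvol(volumes, window=20):
    """Relative volume: latest volume / mean of the previous `window` volumes."""
    prior = volumes[-(window + 1):-1]
    if not prior:
        return None  # not enough history
    average = sum(prior) / len(prior)
    return volumes[-1] / average if average else None


# Illustrative data only.
ath, pct_from_ath = ath_metrics([100.0, 120.0, 110.0], last_close=108.0)
print(ath, round(pct_from_ath, 2))
print(rvol([100, 100, 100, 300], window=3))
```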
2. Earnings Performance
Script: process_earnings_performance.py
Logic:
- Read company_filings/{SYMBOL}_filings.json
- Find most recent “Quarterly Results” filing
- Extract date and closing price on that day from OHLCV
- Calculate returns from earnings day to current price
- Find max price since earnings to calculate peak returns
Fields Added (3 fields):
- Quarterly Results Date
- Returns since Earnings(%)
- Max Returns since Earnings(%)
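The return calculations in the logic above can be sketched as follows. The assumptions are that the OHLCV rows are available as `(date, close)` pairs sorted by date, that the base price is the close on the earnings day (or the first trading day after it), and that the peak is measured on closing prices; `earnings_returns` is a hypothetical name.

```python
from datetime import date


def earnings_returns(bars, earnings_date):
    """bars: list of (date, close) tuples sorted ascending by date."""
    closes_since = [close for day, close in bars if day >= earnings_date]
    if not closes_since:
        return None  # no trading data on or after the earnings date
    base = closes_since[0]      # close on (or first session after) earnings day
    current = closes_since[-1]  # most recent close
    peak = max(closes_since)    # assumption: peak measured on closes
    return {
        "Quarterly Results Date": earnings_date.isoformat(),
        "Returns since Earnings(%)": round((current - base) / base * 100, 2),
        "Max Returns since Earnings(%)": round((peak - base) / base * 100, 2),
    }


# Illustrative bars only.
bars = [
    (date(2024, 5, 1), 95.0),
    (date(2024, 5, 2), 100.0),
    (date(2024, 5, 3), 110.0),
    (date(2024, 5, 6), 105.0),
]
print(earnings_returns(bars, date(2024, 5, 2)))
```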
3. F&O Data Enrichment
Script: enrich_fno_data.py
Reads:
- fno_lot_sizes_cleaned.json (lot size mapping)
- fno_expiry_calendar.json (next expiry dates)
- fno_stocks_response.json (F&O stock list)
Logic:
- If symbol in F&O list → set FNO Flag: Yes
- Look up lot size from mapping
- Find next expiry date from calendar
Fields Added:
- FNO Flag (Yes/No)
- Lot Size
- Next Expiry (date)
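A minimal sketch of the F&O enrichment step. The shapes of the three input files are not documented here, so the sketch assumes they have already been loaded into a set of symbols, a symbol-to-lot-size dict, and a list of expiry dates. All sample values are placeholders.

```python
from datetime import date


def enrich_fno(stock, fno_symbols, lot_sizes, expiries, today):
    """Add FNO Flag, Lot Size and Next Expiry to one stock record."""
    symbol = stock["Symbol"]  # assumption: records carry a "Symbol" key
    if symbol not in fno_symbols:
        stock["FNO Flag"] = "No"
        return stock
    stock["FNO Flag"] = "Yes"
    stock["Lot Size"] = lot_sizes.get(symbol)
    upcoming = [d for d in expiries if d >= today]
    stock["Next Expiry"] = min(upcoming).isoformat() if upcoming else None
    return stock


# Placeholder inputs for illustration only.
fno_symbols = {"AAA"}
lot_sizes = {"AAA": 250}
expiries = [date(2024, 5, 30), date(2024, 6, 27)]

print(enrich_fno({"Symbol": "AAA"}, fno_symbols, lot_sizes, expiries, date(2024, 6, 1)))
print(enrich_fno({"Symbol": "BBB"}, fno_symbols, lot_sizes, expiries, date(2024, 6, 1)))
```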
4. Market Breadth & Relative Strength
Script: process_market_breadth.py
Calculation:
- Uses return data already in base JSON
- Computes relative strength rating (1-100)
- Generates market breadth statistics
Fields Added:
- Relative Strength Rating
- Market breadth percentile
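The rating formula itself is not given, so the sketch below shows one common approach as an assumption: rank every stock by a single return figure and convert the rank to a 1-100 percentile. The script's actual weighting of return periods may differ, and ties are broken by sort order here rather than shared.

```python
def relative_strength_ratings(returns_by_symbol):
    """Map each symbol to a 1-100 rating based on its percentile rank by return."""
    ranked = sorted(returns_by_symbol, key=returns_by_symbol.get)
    total = len(ranked)
    ratings = {}
    for position, symbol in enumerate(ranked, start=1):
        percentile = position / total * 100
        # Clamp so the weakest stock still receives at least 1.
        ratings[symbol] = max(1, min(100, round(percentile)))
    return ratings


# Illustrative returns (%), not real data.
ratings = relative_strength_ratings({"A": 10.0, "B": -5.0, "C": 30.0, "D": 0.0})
print(ratings)
```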
5. Historical Market Breadth
Script: process_historical_market_breadth.py
Output: Separate time-series file for charting (not added to master JSON)
Phase 4 result:
- Complete JSON with all 86 fields for all 2,775 stocks
- Ready for compression
Phase 5: Compression
Simple gzip compression of the final JSON:
- Raw JSON: ~38 MB
- Compressed: ~7.5 MB
- Ratio: 80% reduction
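The compression step can be done with Python's standard library alone. This is a sketch: the `.gz` output name is an assumption, and `compress_json` is a hypothetical helper.

```python
import gzip
import shutil


def compress_json(src, dst=None):
    """Gzip src into dst (defaults to src + ".gz") and return the output path."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, avoids loading 38 MB at once
    return dst
```

A call such as `compress_json("all_stocks_fundamental_analysis.json")` would write the compressed copy next to the original and leave the raw file untouched.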
Final Output Structure
Complete JSON Schema
Data Lineage Summary
Next Steps
- Output Schema: Detailed breakdown of all 86 fields
- Pipeline Architecture: Understand the 6-phase design
- API Endpoints: Complete Dhan API endpoint reference
- Pipeline Settings: Configure pipeline behavior and flags
- OHLCV Configuration: Optimize OHLCV download strategy